ChatGPT for digital pathology research

The rapid evolution of generative artificial intelligence (AI) models including OpenAI’s ChatGPT signals a promising era for medical research. In this Viewpoint, we explore the integration and challenges of large language models (LLMs) in digital pathology, a rapidly evolving domain demanding intricate contextual understanding. The restricted domain-specific efficiency of LLMs necessitates the advent of tailored AI tools, as illustrated by advancements seen in the last few years including FrugalGPT and BioBERT. Our initiative in digital pathology emphasises the potential of domain-specific AI tools, where a curated literature database coupled with a user-interactive web application facilitates precise, referenced information retrieval. Motivated by the success of this initiative, we discuss how domain-specific approaches substantially minimise the risk of inaccurate responses, enhancing the reliability and accuracy of information extraction. We also highlight the broader implications of such tools, particularly in streamlining access to scientific research and democratising access to computational pathology techniques for scientists with little coding experience. This Viewpoint calls for an enhanced integration of domain-specific text-generation AI tools in academic settings to facilitate continuous learning and adaptation to the dynamically evolving landscape of medical research.
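As a rough illustration of the "curated literature database" plus "referenced information retrieval" idea described above, the following is a minimal sketch of retrieval followed by prompt construction with numbered sources. It is not the implementation behind our tool: the toy corpus, the bag-of-words cosine similarity, and the helper names (`tokenize`, `retrieve`, `build_prompt`) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k documents with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc["text"]))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, docs):
    """Number the retrieved sources so the answer can cite them."""
    sources = "\n".join(
        f"[{i}] {doc['title']}: {doc['text']}" for i, doc in enumerate(docs, 1)
    )
    return (
        "Answer using only the numbered sources and cite them.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Toy, made-up corpus entries (hypothetical; not from the curated database).
CORPUS = [
    {"title": "Doc A", "text": "Deep learning predicts microsatellite instability from histology slides."},
    {"title": "Doc B", "text": "Gleason grading of prostate biopsies with convolutional networks."},
    {"title": "Doc C", "text": "Stain normalization methods for whole slide images."},
]

question = "Which papers address Gleason grading of prostate cancer?"
hits = retrieve(question, CORPUS, k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

In a sketch like this, the language model only sees the retrieved, numbered sources, which is one plausible way to obtain answers that carry reference markers; a production system would likely use learned embeddings rather than word counts.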

information provided in the retrieved documents and may not be exhaustive of all research papers in the field.
Compiling a comprehensive and up-to-date list of all research papers that utilize deep learning algorithms on whole slide images (WSIs) for predicting molecular alterations is a complex task that requires access to multiple scientific databases and continuous monitoring of new publications. However, I can highlight several key studies in this domain to give you an idea of the landscape. This list will include some seminal works and recent impactful papers, but it is by no means exhaustive. For the most current research, I recommend searching databases like PubMed, Google Scholar, and arXiv.
Prompt: What molecular alterations were the focus of the study titled "Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images"

Ours (GPT4DFCI-RAG) ChatGPT 4
The study titled "Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images" focused on predicting the status of molecular pathways and key mutations in colorectal cancer. The specific molecular alterations that were the focus of the study included microsatellite instability (MSI), hypermutation density, chromosomal instability, CpG island methylator phenotype (CIMP) high prediction, BRAF mutation, and TP53 mutation 1. These molecular features are significant as they can influence the prognosis and treatment strategies for colorectal cancer patients.
The study titled "Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images" primarily focused on leveraging deep learning to predict the status of key molecular pathways and specific mutations in colorectal cancer (CRC) based on routine hematoxylin and eosin (H&E) stained histology images. Although I cannot provide specific details from the study without accessing it directly, typically, studies like this aim to predict alterations in several key molecular pathways and mutations known to be significant in the pathogenesis and treatment response of colorectal cancer. The study likely used a weakly supervised learning approach because obtaining large datasets with detailed annotations of molecular pathway statuses and specific mutations from histology images can be challenging. Instead, this approach allows the model to learn from less-specific labels, such as the overall presence or absence of certain mutations or pathway activations inferred from clinical or genomic data, to make predictions directly from the histology images.
For precise information about which molecular alterations were the focus of the study and how the deep learning framework was developed and validated, the original article would need to be consulted. Researchers in digital pathology and computational biology, such as yourself, might find this study's methodology and findings particularly relevant for advancing the field of predictive modeling in oncology.
Prompt: List the available visual-language foundation models available for histopathology WSIs?

Prompt
Ours (GPT4DFCI-RAG) ChatGPT 4
The available visual-language foundation models for histopathology Whole Slide Images (WSIs) mentioned in the retrieved documents include:
1. CONtrastive learning from Captions for Histopathology (CONCH) - A visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and over 1.17 million image-caption pairs via task-agnostic pretraining 1.
2. PLIP - A visual-language foundation model mentioned as a concurrent system to CONCH 2.
3. BiomedCLIP - Another visual-language foundation model that is compared with CONCH in the documents 2.
4. OpenAICLIP - A general visual-language model that has been applied to histopathology and is referenced in the documents 2.
These models are designed to be transferred to a wide range of downstream tasks involving either or both histopathology images and text, achieving state-of-the-art performance on various histology image classification, segmentation, captioning, text-to-image, and image-to-text retrieval tasks.
Visual-language foundation models designed for analyzing histopathology whole slide images (WSIs) represent a growing area of interest in digital pathology, leveraging the advances in AI to interpret complex biological data. Here are some notable models and approaches that integrate visual (image) and language (textual annotation or description) data for the analysis of histopathology images:

1. CLIP (Contrastive Language-Image Pre-training):
• While CLIP itself is not specifically tailored for histopathology, its architecture, which jointly learns from images and text through a contrastive loss, has inspired adaptations in the medical imaging field. For instance, researchers can train modified versions of CLIP on paired images and textual pathology reports to improve the model's ability to understand and annotate histopathological features.

2. BioViL (Biomedical Vision-Language):
• BioViL is an adaptation of the vision-language model specifically for biomedical contexts. It can be pre-trained on a large dataset of medical images and corresponding annotations, then fine-tuned for specific tasks like disease detection or progression in histopathology slides.

3. Pathomic Fusion:
• This model focuses on integrating multimodal data, including genetic information (omic data) and histopathology images, using a fusion approach to predict clinical outcomes more accurately. While not strictly a visual-language model, it exemplifies the integration of different types of biological data, akin to combining image and text.

4. Domain-specific adaptations of BERT (Bidirectional Encoder Representations from Transformers) for pathology reports:
• BERT and other transformer-based models have been adapted to process pathology reports. These models can be combined with image analysis models (like CNNs) in a dual-stream architecture to enhance performance on tasks like tumor classification or mutation prediction from WSIs.

5. Custom Transformer Models:
• Researchers have started to develop transformer-based models that directly incorporate both histopathology images and associated textual data (like clinical notes or pathology reports) to perform tasks such as diagnosis prediction, prognosis estimation, or even gene expression profiling.

These models typically require substantial datasets with well-annotated images and corresponding text data for training. The success of these systems depends heavily on the quality and quantity of the dataset, as well as the specificity of the annotations provided. In your case, developing or adapting a visual-language model could involve training on datasets comprising WSIs and detailed annotations describing the histopathological features and clinical information pertinent to prostate cancer.
Prompt: How did CONCH compare to the other visual-language foundation models in terms of accuracy?
Ours (GPT4DFCI-RAG) ChatGPT 4
CONCH (CONtrastive learning from Captions for Histopathology) outperformed other visual-language foundation models in terms of accuracy across various tasks. On slide-level tasks, CONCH achieved a balanced accuracy score of 84.7%, and an average accuracy of 90.5% across three tasks, which was higher than PLIP and BiomedCLIP, which had 86.6% and 87.9% respectively, although no statistical significance was detected for each task 1.
Please note that some datasets are mentioned in the context of the challenges associated with nuclei annotation and may not be primarily focused on nuclei annotation but include it as part of their data.
When it comes to datasets with nuclei annotations, particularly useful for tasks in digital pathology and computational biology, several notable ones are available across various platforms.
Here's a list of some widely-used datasets:
1. The Cancer Genome Atlas (TCGA): While primarily known for its genomic data, TCGA also includes histopathological images (H&E stained) that often come with expert annotations, including nuclei.
2. The Cancer Imaging Archive (TCIA): Similar to TCGA, TCIA offers a variety of imaging data, including annotated histological images which can be useful for nuclei studies.
3. MoNuSeg (Multi-organ Nuclei Segmentation Challenge): Specifically designed for nuclei segmentation, this dataset includes annotated nuclei from multiple organs, derived from both TCGA and TCIA.
4. The Kumar dataset: Released as part of a challenge for nuclei segmentation, it includes images from multiple tissue types with detailed nuclei annotations.
5. GlaS@MICCAI2015 (Gland Segmentation in Colon Histology Images): Though focused on gland segmentation, this dataset includes detailed structure annotations that can indirectly aid in nuclei annotation studies.
6. Lung and Colon Cancer Histopathological Images (LC25000): This dataset includes labeled images of lung and colon cancers, where nuclei annotations can be indirectly inferred through cancer cell annotations.
7. BraTS (Brain Tumor Segmentation Challenge): While primarily brain tumor images, some datasets include detailed annotations that can help with nuclei studies, especially in tumor regions.
These datasets are often used in research for developing and benchmarking algorithms for tasks like nuclei detection, segmentation, and classification in various tissue types. Each dataset comes with its own set of rules and accessibility options, so it's important to check their respective websites for usage guidelines and licensing information.
Prompt: Give a brief description of the PANDA challenge dataset
After completing these steps, PathML should be installed in your Conda environment. You can start using it for your digital pathology analysis projects.
Prompt: How can I create an imaging processing transform that is compatible with PathML?
To create an imaging processing transform that is compatible with PathML, you'll want to define a custom transform class that inherits from PathML's base transform class. This involves implementing the __call__ method, where you define the processing logic for your custom transform. Below is a general template to guide you through creating a custom imaging processing transform in PathML:

In this template, CustomTransform is your new transform class that applies your specific imaging processing algorithm to a SlideData object. The __call__ method is where the actual processing happens, and it must take a SlideData object as input and return the modified SlideData object.

Here's a more concrete example, demonstrating a hypothetical transform that applies a simple Gaussian blur to the image data:

Seminal and Recent Impactful Papers: 1. "Deep learning-based transformation of H&E stained tissues into special stains" by Bayramoglu, N., & Heikkilä, J. (2016). This paper discusses the use of deep learning for transforming H&E stained tissue images into images that resemble those stained with other chemicals, indicating an early exploration of deep learning in histopathology

be anything from filtering, segmentation, normalization, etc.
# You can access the image data using slide.slide, and you should modify this array
# or create a new one based on your processing needs
# After processing, make sure to update the slide object as needed
# For example, if you create a new processed image, you might do:
# slide.slide = processed_image
# Return the modified slide object
return slide

.slide is a numpy array of the image
# Apply Gaussian Blur to each tile in the slide
for tile in slide.tiles.values():
    tile.image = cv2.GaussianBlur(tile.image, self.kernel_size, self.sigmaX)
return slide

This GaussianBlurTransform class applies a Gaussian blur to each tile's image in the slide. You can customize the kernel size and sigmaX value for the blur through the class constructor. To use this custom transform in a PathML pipeline, you would do something like this:

(filepath="path/to/your/slide.svs", name="ExampleSlide")
# Run the pipeline on your slide
processed_slide = slide.run(pipeline)
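Because the code in the chatbot response above survives only in fragments, the following self-contained sketch may help readers see the pattern the response describes: a callable transform that receives a slide, blurs every tile, and returns the slide. It deliberately does not import PathML or OpenCV. The `Tile` and `Slide` classes are hypothetical stand-ins, and the 3x3 kernel replaces `cv2.GaussianBlur`, so this is an assumption-laden illustration; PathML's actual base class and method names should be checked against its documentation.

```python
from dataclasses import dataclass, field

# 3x3 Gaussian kernel; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


@dataclass
class Tile:
    """Hypothetical stand-in for a tile (not a PathML class)."""
    image: list  # 2D grid of grayscale values


@dataclass
class Slide:
    """Hypothetical stand-in for a slide holding named tiles."""
    tiles: dict = field(default_factory=dict)


def gaussian_blur_3x3(image):
    """Blur a 2D list with the 3x3 kernel, replicating edge pixels."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp row index
                    cc = min(max(c + dc, 0), cols - 1)  # clamp column index
                    total += KERNEL[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = total / 16.0
    return out


class GaussianBlurTransform:
    """Callable transform: blurs each tile in place and returns the slide."""

    def __call__(self, slide):
        for tile in slide.tiles.values():
            tile.image = gaussian_blur_3x3(tile.image)
        return slide


# Toy usage: a single bright pixel is spread over its neighbours.
slide = Slide(tiles={"t0": Tile(image=[[0, 0, 0], [0, 16, 0], [0, 0, 0]])})
blurred = GaussianBlurTransform()(slide)
print(blurred.tiles["t0"].image)
```

Keeping the blur in a plain function and the slide traversal in the transform class separates the image operation from the container logic, which makes the operation easy to test on small arrays before it is wired into a real pipeline.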

). This study illustrates how integrating deep learning-based analysis of WSIs with genomic data can improve survival prediction in breast cancer patients, indicating a move towards more holistic approaches in cancer prognosis.
For the latest papers, you should use specific search queries related to your field of interest, like "deep learning whole slide images molecular alterations" in scientific databases. Furthermore, tools like Google Scholar's alert function can help you stay up-to-date with newly published research in this rapidly evolving field.

Prompt: List research papers that used deep learning models for deriving Gleason score from WSIs of prostate cancer
Ours (GPT4DFCI-RAG) ChatGPT 4
10. "Machine learning approaches to analyze histological images of tissues from radical prostatectomies" 3.
11. "Automatic gleason grading of prostate cancer using quantitative phase imaging and machine learning" 3.
12. "Detecting prostate cancer using deep learning convolution neural network with transfer learning approach" 3.
13. "Automated gleason grading of prostate cancer tissue microarrays via deep learning" 3.
14. "High-accuracy prostate cancer pathology using deep learning" 3.
15. "Development and validation of a deep learning algorithm for improving gleason scoring of prostate cancer" 3.
16. "Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases" 3.
17. "Deep learning in cancer pathology: a new generation of clinical biomarkers" 3.
18. "Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study" 3.
19. "Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge" 3.
20. "Going deeper through the gleason scoring scale: An automatic end-to-end system for histology prostate grading and cribriform pattern detection" 3.
21. "Deep learning for automatic gleason pattern classification for grade group determination of prostate biopsies" 3.
22. "Development and validation of an artificial intelligence-powered platform for prostate cancer grading and quantification" 3.
23. "Deep learning-based gleason grading of prostate cancer from histopathology images--role of multiscale decision aggregation and data augmentation" 4.
24. "Multiple Instance Learning with Mixed Supervision in Gleason Grading" by Hao Bian, Zhuchen Shao, Yang Chen, Yifeng Wang, Haoqian Wang, Jian Zhang, and Yongbing Zhang 5.
These papers represent a selection of the research conducted in the field of digital pathology, specifically using deep learning for the Gleason grading of prostate cancer from WSIs.

Deep learning models have been increasingly applied to Whole Slide Images (WSIs) of prostate cancer for various diagnostic and prognostic purposes, including the derivation of the Gleason score. The Gleason score is a critical factor in prostate cancer grading, providing insights into the aggressiveness of the cancer based on the histological patterns observed in prostate tissue. Here is a list of research papers that have utilized deep learning models to derive Gleason scores from WSIs:

5. "Deep Learning-Assisted Diagnosis of Prostate Cancer with Gleason Score Prediction on Biopsy Slides" - This paper by Lucas et al. focuses on the application of deep learning models to assist in the diagnosis of prostate cancer and predict Gleason scores from biopsy slides, highlighting the model's diagnostic performance. [IEEE Access, 2019]
6. "A Deep Learning Model to Predict a Diagnosis of Prostate Cancer in Whole-Slide Images of Biopsies" - Pantanowitz et al. investigated the feasibility of using deep learning models to predict prostate cancer diagnosis and provide an estimation of the Gleason score directly from WSIs of prostate biopsies. [JAMA Oncology, 2020]
accuracy and efficiency of prostate cancer diagnosis and grading. They underscore the potential of computational approaches, such as deep learning, to augment traditional pathology methods, offering more precise and reproducible assessments of critical factors like the Gleason score.

The CONCH model, which stands for "CONtrastive learning for Compositional Hypernymy," was introduced in a research paper titled "CONCH: Compositional Hypernymy with Contrastive Learning" in 2021. The primary focus of CONCH is on improving the understanding of compositional hypernymy (a type of word relation where the meaning of one word encompasses the meaning of another, like "animal" and "dog") using contrastive learning techniques. This model is particularly aimed at natural language processing (NLP) tasks related to word relationships and semantic composition. CONCH was not specifically compared against visual-language foundation models since it's more focused on textual semantic relationships rather than multimodal (visual and language) tasks. Visual-language foundation models, such as CLIP by OpenAI or Google's ALIGN, are designed to understand and correlate information across both visual and textual data, which is a broader scope compared to what CONCH aims to address.
Prompt: list all available histopathology slide datasets with nuclei annotation
The PANDA challenge dataset is a large collection of whole slide images (WSIs) used for the development of deep learning-based systems, particularly for the grading of prostate cancer using the International Society of Urological Pathology (ISUP) grading system. A total of 10,616 WSIs were used in the study, which are available from the Prostate cANcer graDe Assessment (PANDA) challenge. The dataset includes WSIs collected from an unseen test set, with the development set consisting of WSIs obtained from the two institutions mentioned above 1. The dataset was used in the PANDA challenge, which was a global AI competition organized to catalyze the development of reproducible AI algorithms for Gleason grading of digitized prostate biopsies. The challenge was the largest histopathology competition to date, joined by 1,290 developers, and it was the largest publicly available dataset of prostate biopsies at the time 2. The PANDA challenge dataset is publicly available for further research, and the most up-to-date information regarding the dataset can be found on the challenge website. Additionally, code used to generate the results of the various algorithms and example code on how to load the images in the PANDA dataset is available online 2 3.

Table S2. Examples of usage of our GPT4-powered PathML Chatbot, designed to provide code and instructions on using PathML for image analysis in response to text prompts.
Prompt: How do I install PathML on a mac?
To install PathML on a Mac, you will follow these steps. Note that this process involves using the terminal and requires that you have Conda installed. If you do not have Conda, you can download Miniconda here. Here are the detailed steps:
1. Create a Conda environment: Open your terminal and create a new Conda environment specifically for PathML. This environment will isolate the PathML dependencies from other Python packages you may have installed. You can create the environment with Python 3.8 (recommended for compatibility) by running:
2. Activate the Conda environment: Once the environment is created, you need to activate it. You can do this by running:
3. Install external dependencies: Before installing PathML, you must install its external dependencies. On a Mac, this primarily involves installing OpenSlide. You can do this using Homebrew, a package manager for macOS. If you do not have Homebrew installed, you can find installation instructions on the Homebrew website. Install OpenSlide by running:

conda install openjdk==8.0.152
5. Install PathML: Finally, with all dependencies in place, you can install PathML using pip. Run the following command to install it:
pip install pathml
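After following installation steps like those in the response above, a quick check from Python can confirm that the packages are visible in the active environment. The sketch below is an illustrative assumption rather than part of the chatbot output: `check_modules` is a hypothetical helper, and `openslide` is assumed to be the import name of the OpenSlide Python bindings.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report whether the packages named in the installation steps are importable.
status = check_modules(["pathml", "openslide"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Using `importlib.util.find_spec` only locates the modules without importing them, so the check stays fast and does not fail if a package is present but one of its native dependencies is misconfigured; a full `import pathml` remains the definitive test.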